RSDB: representative protein sequence databases have high information content
Similar resources
RSDB: representative protein sequence databases have high information content
MOTIVATION: Biological sequence databases are highly redundant for two main reasons: (1) various databanks keep redundant sequences, with many identical and nearly identical entries, and (2) natural sequences often have high sequence identities due to gene duplication. We wanted to know how many sequences can be removed before the databases start losing homology information. Can a database of sequence...
Protein Databases
Proteins are sources of many peptides with diverse biological activity. Some of them are considered as valuable components of foods and drug targets with desired and designed biological activity. We are now entering an era rich in biological data in which the field of bioinformatics is poised to exploit this information in increasingly powerful ways. There are currently many databases all over ...
Protein sequence databases.
A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves...
Representative Protein Sequence and Structure Database
The database provides information about a non-redundant protein dataset (1573 proteins) obtained from the Protein Data Bank. The information includes the PDB ID, protein length, resolution, PDB secondary structure, PDB secondary structure summary, PHD secondary structure prediction, PHD secondary structure prediction summary, and sequence. We further revised the PDB Secondary structure sum...
UniqueProt: creating representative protein sequence sets
UniqueProt is a practical and easy to use web service designed to create representative, unbiased data sets of protein sequences. The largest possible representative sets are found through a simple greedy algorithm using the HSSP-value to establish sequence similarity. UniqueProt is not a real clustering program in the sense that the 'representatives' are not at the centres of well-defined clus...
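As a rough illustration of what a greedy selection of representatives can look like, here is a minimal sketch. It is not UniqueProt's procedure: the `similarity` function is a stand-in based on Python's `difflib` rather than the HSSP-value, and the names `greedy_representatives` and `threshold` as well as the toy sequences are assumptions made for this example only.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]. This is NOT the HSSP-value."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_representatives(
    seqs: List[str],
    threshold: float = 0.8,  # hypothetical cutoff, chosen for illustration
    score: Callable[[str, str], float] = similarity,
) -> List[str]:
    """Keep a sequence only if it scores below `threshold` against
    every representative kept so far (longest sequences are tried first)."""
    reps: List[str] = []
    for s in sorted(seqs, key=len, reverse=True):
        if all(score(s, r) < threshold for r in reps):
            reps.append(s)
    return reps


# Toy input: an exact duplicate, a near-duplicate, and an unrelated sequence.
toy = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYIAKQK", "GGGGSSSSPP"]
reps = greedy_representatives(toy)
print(reps)
```

On the toy input, the duplicate and the near-duplicate are dropped and two representatives remain. A real pipeline would replace `similarity` with an alignment-based measure and would need to avoid the quadratic number of comparisons on large databases.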
Journal
Journal title: Bioinformatics
Year: 2000
ISSN: 1367-4803,1460-2059
DOI: 10.1093/bioinformatics/16.5.458